Add wasm32 simd128 intrinsics #549
Conversation
cc @sunfish
This looks awesome to me, thanks so much for taking this on @gnzlbg!
I haven't thoroughly audited the implementations yet, but here's some thoughts I've got:
-
In general wasm is very different from x86 in that there's not really well-established prior art about how these intrinsics should be exposed or things like C signatures/names we should apply. We do, however, have a very well defined instruction set in the wasm standard. These instructions are well typed and easily map to functions, so I think it's worthwhile (and stabilizable!) to map directly to the instructions in the standard, using straightforward mappings to Rust idioms (as you've done here) where possible.
-
Instead of `pub struct i64x2;` and such, perhaps it could be `pub mod i64x2`? I don't think we need `i64x2` to be a type, right? I really like the idea, though, of having the `::` correspond to the `.` in wasm. It would also mean that we may want to consider renaming the memory intrinsics to their new names, which have `mem.*` in the name (food for thought; a rough sketch of the module layout follows this list). -
I like how all the functions are named precisely after the name of the instruction, that works well.
-
The `#[assert_instr]` macro isn't working here, but I don't think it'll be too hard to get it working with `wasm-dis`; I believe that's got all the necessary support for the SIMD instructions. I don't think that should block the initial implementation though, I can work on getting that done after this lands (when I get some time) -
I'm surprised actually that `asm!` works for the wasm target! -
Unlike x86, I don't think we can do runtime detection at all on wasm. Entire wasm modules are either valid or invalid, so there's no way to have some functions with SIMD not get used if the browser doesn't support SIMD. In that sense runtime detection must be done prior to instantiation in JS and then an appropriate module must be loaded, and projects which want to target browsers that do/don't support SIMD will need multiple builds of their crate.
Now this is super tricky with the standard library because the standard library won't enable this feature by default. That being said, I think it's very likely that we ship a second wasm standard library target which enables this (if it becomes popular), or we make it super easy to work with xargo, making this a moot point.
tl;dr: the `#[cfg]` on all functions is unfortunate, but I believe that the wasm32 architecture simply requires it; we have no other choice.
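To make the module idea and the `#[cfg]` point above concrete, here is a rough sketch (stand-in types and bodies, not the PR's actual code) of what a module-per-interpretation layout gated on the feature could look like:

```rust
// `v128` is a stand-in wrapper here; in the real crate it would be the
// 128-bit vector type. The `#[cfg]` gate means the functions simply do not
// exist unless the crate is compiled with simd128 enabled, mirroring the
// "no runtime detection on wasm" point above.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
pub struct v128(pub [u64; 2]);

#[cfg(target_feature = "simd128")]
pub mod i64x2 {
    use super::v128;

    /// Mirrors the `i64x2.add` instruction: lane-wise wrapping addition.
    /// The body is a scalar stand-in for what the instruction would do.
    pub fn add(a: v128, b: v128) -> v128 {
        v128([a.0[0].wrapping_add(b.0[0]), a.0[1].wrapping_add(b.0[1])])
    }
}
```

Calling it as `i64x2::add(a, b)` then reads exactly like the `i64x2.add` instruction it wraps.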
coresimd/wasm32/simd128.rs
Outdated
/// from `v1` when `1` and `v2` when `0`.
#[inline]
// #[target_feature(enable = "simd128")]
// FIXME: #[cfg_attr(test, assert_instr($id.bitselectnot))]
s/bitselectnot/bitselect/
coresimd/wasm32/simd128.rs
Outdated
// #[target_feature(enable = "simd128")]
// FIXME: #[cfg_attr(test, assert_instr($id.bitselectnot))]
pub unsafe fn bitselect(v1: v128, v2: v128, c: v128) -> v128 {
    // FIXME: use llvm.select instead - we need to add a `simd_bitselect`
Does this FIXME mean that the implementation here is incorrect? If so, should this method perhaps be commented out for now?
The implementation is correct, but I haven't checked that it generates the appropriate `v128.bitselect` instruction. I only expect LLVM to do that if this gets optimized into an `llvm.select`, so I think we should just be generating that in the first place.
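For reference, a scalar model of the semantics `v128.bitselect` is supposed to have (illustrative only; the real intrinsic operates on a whole `v128`, and whether LLVM emits the actual instruction is exactly the open question above):

```rust
// Bit-level select: take each bit from `v1` where the mask bit is 1 and
// from `v2` where it is 0. `u128` stands in for the 128-bit vector.
fn bitselect_model(v1: u128, v2: u128, c: u128) -> u128 {
    (v1 & c) | (v2 & !c)
}

fn main() {
    assert_eq!(bitselect_model(0xFF, 0x00, 0x0F), 0x0F);
}
```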
coresimd/wasm32/simd128.rs
Outdated
union U {
    vec: self::sealed::$ivec_ty,
    a: v128
}
Out of curiosity, is there a particular reason to use unions instead of `transmute`?
I thought / think that we can make these `const_fn` eventually, so unions here are forward compatible with that.
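A minimal sketch of that union-based reinterpretation pattern, with hypothetical stand-in types (not the macro's real `$ivec_ty`), just to illustrate the trick being used:

```rust
#[derive(Clone, Copy)]
#[repr(C)]
struct Bytes([u8; 16]);

#[derive(Clone, Copy)]
#[repr(C)]
struct Words([u32; 4]);

// Writing one union field and reading the other reinterprets the bits,
// the same trick the macro above uses between `$ivec_ty` and `v128`.
union Reinterpret {
    bytes: Bytes,
    words: Words,
}

fn as_words(b: Bytes) -> Words {
    // Safety: both variants are plain-old-data of the same size, and every
    // bit pattern is valid for both.
    unsafe { Reinterpret { bytes: b }.words }
}

fn main() {
    let w = as_words(Bytes([1; 16]));
    assert_eq!(w.0[0], 0x0101_0101);
}
```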
coresimd/wasm32/simd128.rs
Outdated
// #[target_feature(enable = "simd128")]
// FIXME: #[cfg_attr(test, assert_instr(v128.const, imm = [ImmByte::new(42); 16]))]
#[rustc_args_required_const(0)]
pub const unsafe fn const_(imm: [ImmByte; 16]) -> v128 {
I'm a little confused, how come this requires `ImmByte`? Couldn't this be `[u8; 16]`?
That's just what the type in the WASM spec is called. I didn't want to start adding rules like "`ImmByte` is always translated to `u8`", but given that that's what I actually ended up doing while testing (`ImmByte::new()` is a pain), we might just as well add that rule. Maybe the same for the `LaneIdx` types.
Should we keep these as type aliases for consistency with the spec? E.g.
pub type ImmByte = u8;
pub type LaneIdx... = u8;
....
?
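To illustrate the difference at the call site, a quick sketch with stand-in items (not the real API):

```rust
// Hypothetical alias + constructor, mirroring the shape of `const_` above.
pub type ImmByte = u8;

#[derive(Clone, Copy)]
pub struct V128([u8; 16]);

pub const fn const_(imm: [ImmByte; 16]) -> V128 {
    V128(imm)
}

fn main() {
    // With the alias this is just `[42; 16]`, instead of building
    // sixteen `ImmByte::new(42)` values by hand.
    let _v = const_([42; 16]);
}
```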
coresimd/wasm32/simd128.rs
Outdated
}
};
}
impl_laneidx!(LaneIdx2(u8): [0, 1] | /// A byte with values in the range 0–1 identifying a lane. |
I think it'd be really hard for us to stabilize these `LaneIdx*` types unfortunately; would it be possible though to say it's UB to specify an out-of-bounds index and just take `usize` for indices?
> I think it'd be really hard for us to stabilize these LaneIdx* types unfortunately,

Why do you say so?

> would it be possible though to say it's UB to specify an out of bounds index and just take usize for indices?

Duh, the docs don't say it, but stating that if the value is out-of-bounds the behavior is undefined was the intent.
I went with the new types because it felt more conservative, and it should allow us to insert debug asserts in the future (undefined behavior here means we can do anything, including properly diagnosing the error at run-time).
Having been using these a bit while running the tests, I always just ended up transmuting `u8`s anyways, so I am myself unconvinced that this is worth it. (The spec says these are byte sized, so I think it makes sense to just make these `u8` instead of `usize` here.)
If we make them newtypes, these should all be `#[repr(transparent)]` though, as @cramertj pointed out.
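If the newtypes do stay, a sketch of the shape being discussed (a hypothetical `LaneIdx16`, not the PR's exact macro output): `#[repr(transparent)]` keeps the ABI of a bare `u8`, and a `debug_assert!` gives the run-time diagnostic that the UB wording leaves room for.

```rust
#[repr(transparent)]
#[derive(Clone, Copy)]
pub struct LaneIdx16(u8);

impl LaneIdx16 {
    /// Out-of-range values (>= 16) are the caller's responsibility;
    /// in debug builds we still catch them.
    #[inline]
    pub unsafe fn new_unchecked(idx: u8) -> Self {
        debug_assert!(idx < 16, "lane index out of range");
        LaneIdx16(idx)
    }
}
```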
Yes, typically the platform vendor includes headers for this for their compilers, and those become the C API. I don't think Clang and GCC are there yet beyond having some
Stabilizable, maybe some day. Right now I mostly wanted to be sure that our toolchain can handle these, that these are actually implementable (the spec was a bit ambiguous in some places), and that we can use these in
So this is a "chronological bug". I originally wanted the syntax to be the same as in the WebAssembly spec, which would mean I tried modules, but a minor inconvenience is that I cannot reopen a module from the one it is defined in to add more stuff to it, like this:

```rust
// mod parent
mod child { /* ...stuff... */ }
mod child { /* ...more stuff... */ } // ERROR
```

This makes it hard to incrementally grow the contents of the module with macros that are decoupled from one another (is it worth it to file an issue for this or open an internals post?). Types make this easy:

```rust
pub struct Foo;
impl Foo { /* ...stuff... */ }
impl Foo { /* ...more stuff... */ } // OK
```

So while I agree that modules would be better, and that we shouldn't put implementor inconvenience over user inconvenience, refactoring the type stuff to the module stuff when everything is implemented is easy, but incrementally growing the module stuff is hard. So I'd prefer to do this when we have resolved all other issues, as a final cleanup step. Thinking about C APIs, and using the same names as them, I think we should use
Sadly, concatenating identifiers with macros is even messier than using modules :/
We could do that by exporting deprecated aliases to the old names to avoid breaking code for the time being.
By "works" you mean that the syntax parses? I am adding a
So I think this is very unfortunate. As we have learned from the ARM ecosystem, this means that every target-feature permutation is going to end up being its own "architecture", independent from the others, so with the dozen WASM extensions in flight we are easily going to land in the 100s and 1000s of WASM architectures that we will have to support as soon as these start releasing 1.1 and 2.0 versions of each ISA extension. Do you know people who could raise these concerns with the WebAssembly working group? WASM should have a way to state that "The following
That, combined with an instruction that detects whether the feature is supported or not by the WASM "machine", would be enough.
So the last commit adds run-time tests for all intrinsics using the
The commit changes a couple of things:
Summarizing the TODOs:
I am going to focus on getting CI green, and once that is done, the only blocker for this PR is the last point above (
Force-pushed from 608bfe5 to 0bd5460
@gnzlbg I've pushed up a commit which runs wasm tests like all the other tests (via docker and I've commented it out currently here -- https://github.com/gnzlbg/stdsimd/blob/608bfe588bd8b4b258f4bef50d7122489d05ac4d/ci/run.sh#L43
Oh oops, looks like I should have told you first! In any case the commit gnzlbg@608bfe5 should put wasm on track to look like all the other targets (and be a bit faster to build with precompiled binaries)
All that makes sense to me, thanks @gnzlbg! For modules vs impls, another point for modules is that
I'm not sure! I can try to ask around though. @sunfish, you may know more about this?
noooo, the build was finally green! :D
Yes, I haven't tried compiling with it enabled yet. I was waiting for it to become properly whitelisted. We would have to get that working for the
That's a good point. I'll do that next then to unblock work on
Force-pushed from 371e779 to 2c0ccc3
One problem with using modules instead of types is that the module
So the last commit uses
I think that for
cc @tlively
cc @sbc100 There seem to be some issues when lowering many of the intrinsics when
I think @tlively will be more useful than me here. I don't know much about the simd in the backend. Are you using some flag to enable the simd wasm instructions? Are you generating the simd with intrinsics or via explicit simd types, or via auto-vectorization?
Currently most of the intrinsics are generated via explicit simd types + generic operations on those; e.g. splat appears to lower to a shufflevector:

```llvm
define hidden void @i8x16_splat(<1 x i128>* noalias nocapture sret dereferenceable(16), i8 %v) {
start:
  %_3.0.vec.insert.i = insertelement <16 x i8> undef, i8 %v, i32 0
  %_3.15.vec.insert.i = shufflevector <16 x i8> %_3.0.vec.insert.i, <16 x i8> undef, <16 x i32> zeroinitializer
  %1 = bitcast <16 x i8> %_3.15.vec.insert.i to i128
  %_2.sroa.0.0.vec.insert.i = insertelement <1 x i128> undef, i128 %1, i32 0
  store <1 x i128> %_2.sroa.0.0.vec.insert.i, <1 x i128>* %0, align 16
  ret void
}
```

but I could use an
Looking at the IR, we see that I am currently representing
All intrinsics are exercised at run-time and pass when
We've really only just started working on SIMD128 support in the wasm LLVM backend despite how long the proposal has been around, so I wouldn't expect anything to work for a while under the simd128 feature flag. Once we have it in a more workable state, having feedback from the Rust side will be very helpful.
Cargo.toml
Outdated
@@ -3,6 +3,9 @@ members = [
    "crates/stdsimd-verify",
    "crates/stdsimd",
]
exclude = [
    "crates/wasm-test"
]
I think this can probably be removed now
done
It's impressive that LLVM's backend is getting these tests passing as is already! (even without simd128 enabled) @gnzlbg when I get a chance I'll start working on
@alexcrichton I would expect that SIMD intrinsics and types are currently being scalarized by LLVM, so performance may be worse than expected, but it's nice to hear that at least it compiles!
Ok, I was itching a bit to get this done, so I've now gotten it done! I've pushed up support for the
To see failures locally, you'll need a patch like this (ish), to generate: https://gist.github.com/alexcrichton/4809bfb8a818621684307ad368b71da4#file-out-log-L324-L621
Progress! Although rust-lang/rust#53179 just landed if you use
@alexcrichton I've added a crate to
I've pushed up a commit which uses
If some intrinsics work with
@alexcrichton We should put the
Otherwise we are going to keep this PR alive on the side. Once
@gnzlbg sounds good to me! If you want to add that in and configure CI to test it I'd be cool merging
will do
* Shell out to Node's `execSync` to execute `wasm2wat` over our wasm file
* Parse the wasm file line-by-line, looking for various function markers and such
* Use the `elem` section to build a function pointer table, allowing us to map exactly from function pointer to a function
* Avoid losing debug info (the names section) in release mode by stripping `--strip-debug` from `rust-lld`
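Roughly, the scan over the disassembly could look like the following sketch (plain Rust with `std::process::Command` rather than Node's `execSync`, and ignoring the `elem`-table bookkeeping; it assumes a `wasm2wat` binary on `PATH` and a preserved names section):

```rust
use std::process::Command;

/// Returns true if `instr` appears in the body of `$func_name` in the
/// `wasm2wat` disassembly of the module at `wasm_path`.
fn function_contains(wasm_path: &str, func_name: &str, instr: &str) -> bool {
    let out = Command::new("wasm2wat")
        .arg(wasm_path)
        .output()
        .expect("failed to run wasm2wat");
    let wat = String::from_utf8_lossy(&out.stdout);

    let mut in_func = false;
    for line in wat.lines() {
        let line = line.trim_start();
        if line.starts_with("(func ") {
            // Each function header either starts our function or ends it.
            in_func = line.contains(&format!("${}", func_name));
            continue;
        }
        if in_func && line.contains(instr) {
            return true;
        }
    }
    false
}
```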
Looks great! I think the
I just removed it, we'll see.
👍
This adds all wasm32 simd128 intrinsics with the exception of integer saturating arithmetic.
Once rust-lang/rust#53179 lands we can enable the target feature and see what happens.
In the meantime, early reviews are welcome.
Basically, the WASM SIMD extension introduces a single SIMD vector type: `v128`. The intrinsics mostly [0] follow a `{interpretation}.{operation}` naming scheme, where:

* `{interpretation}` specifies how the bytes of the SIMD vector are to be interpreted, e.g., as `f32x4`, `i64x2`, etc.
* `{operation}` is the operation to be performed, e.g., `add`. The semantics of the operation depend on the interpretation of the vector type being used, e.g., floating-point addition vs wrapping integer addition.

In Rust, these are translated to `{interpretation}::{operation}`, where:

* `{interpretation}`s are unit structs, e.g., `struct i32x4`
* `{operation}`s are inherent methods on the unit structs, e.g., `i32x4::add(v128, v128) -> v128` (recall that the only vector type in the ISA is `v128`).
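A minimal sketch of that unit-struct flavor of the translation (stand-in types and a scalar stand-in body, not the crate's real code), to contrast with the module layout sketched earlier in the thread:

```rust
#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
pub struct v128([u64; 2]);

#[allow(non_camel_case_types)]
pub struct i64x2;

impl i64x2 {
    /// `i64x2.add` in wasm becomes `i64x2::add` in Rust:
    /// lane-wise wrapping addition over the single `v128` type.
    pub fn add(a: v128, b: v128) -> v128 {
        v128([a.0[0].wrapping_add(b.0[0]), a.0[1].wrapping_add(b.0[1])])
    }
}
```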
And that's pretty much it. I'd say 90% of the ISA is trivial to implement. The problematic bits and pieces are:
* `v8x16.shuffle`, which takes a 128-bit wide immediate-mode argument. Each byte of the argument takes values between 0-31, and there are 16 valid bytes, so... we can't `constify!` our way out of this one. Because the `simd_shuffle32` takes a constant argument, we cannot really use it, so I decided to just use inline assembly and call it a day.
* saturating arithmetic: rustc doesn't have the intrinsics for this yet. These should be trivial to add, and `packed_simd` needs these as well. So this isn't a big deal, it is just not part of this PR.
* `bitselect`: bitselect interprets a `v128` as an `i1x128` mask, but we can't specify this type using `repr(simd)`. The `simd_select` intrinsic is also not what we want here, because it takes a `repr(simd)` type like `i8x16` and internally truncates it to `i1x16` when creating a mask, but if we pass it an `i128x1` it would produce an `i1x1`. I think we should probably add a `simd_bitselect` intrinsic to rustc that just does not perform the truncation - it is necessary for AVX-512 anyways to use an `i64` mask as `i1x64`.
* saturating conversions: I don't think I have implemented these correctly, but since we can't really test WASM much yet, it is hard to tell.
* floating-point min / max: we are currently using `minnum` / `maxnum` here, which, depending on which IEEE 754 standard you are on (2008 vs 2018), behave differently, but in general I think that they do not propagate `NaN`s properly: if one of the two arguments is a `NaN`, they always return the other argument, while these should always return the `NaN` argument. I don't know what the best way to implement the correct behavior is yet. We could use `select` and/or `fcmp` directly or through a rustc intrinsic. Having these other implementations of min/max would be useful: `packed_simd` should probably provide both `min` / `max` and `minnum` / `maxnum` anyways. (A scalar sketch of the intended behavior follows this list.)
* floating-point neg: I've hacked my way around the `neg`; we should probably use an LLVM intrinsic that just flips the sign bit instead of multiplying by `-1.`.
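As referenced above, a scalar model of the intended behavior for the last two points (illustrative only; the real operations are lane-wise over `v128`, and the -0.0/+0.0 corner case of min/max is glossed over):

```rust
/// wasm-style min: propagate NaN from either operand (unlike `minnum`,
/// which returns the non-NaN operand). The -0.0/+0.0 ordering is ignored.
fn wasm_min(a: f32, b: f32) -> f32 {
    if a.is_nan() {
        a
    } else if b.is_nan() {
        b
    } else if a < b {
        a
    } else {
        b
    }
}

/// Negation by flipping the sign bit, instead of multiplying by -1.0.
fn wasm_neg(x: f32) -> f32 {
    f32::from_bits(x.to_bits() ^ 0x8000_0000)
}

fn main() {
    assert!(wasm_min(f32::NAN, 1.0).is_nan());
    assert_eq!(wasm_neg(2.0), -2.0);
    assert!(wasm_neg(f32::NAN).is_nan()); // a sign flip leaves NaN a NaN
}
```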
The most important issue:
[0] There are some exceptions at the end of the file when it comes to conversions, where `{dst_interpretation}.{operation}/{src_interpretation}{:modifier}` is used.